Merging Meeting Times (Practice Interview Question)

You're out of free questions.

Your company built an in-house calendar tool called HiCal. You want to add a feature to see the times in a day when everyone is available.

To do this, you’ll need to know when any team is having a meeting. In HiCal, a meeting is stored as a tuple ↴

A tuple is like a list:

  (17, 3, "My name is Parker")

Python 2.7

(Tuples are written with parentheses to differentiate them from lists.)

Like lists, tuples are ordered and you can access elements by their index:

  cake_tuple = ('angel', 'bundt')

cake_tuple[0]
# returns: 'angel'

Python 2.7

But tuples are immutable! They can't be edited after they're created.

  cake_tuple = ('angel', 'bundt')
cake_tuple[1] = 'carrot'
# raises: TypeError: 'tuple' object does not support item assignment

Python 2.7

Tuples can have any number of elements (the 'tu' in tuple doesn't mean 'two', it's just a generic name taken from words like 'septuple' and 'octuple').

  (90, 4, 54)
(True, False, True, True, False)

Python 2.7

of integers (start_time, end_time). These integers represent the number of 30-minute blocks past 9:00am.

For example:

  (2, 3)  # Meeting from 10:00 – 10:30 am
(6, 9)  # Meeting from 12:00 – 1:30 pm

Write a function merge_ranges() that takes a list of multiple meeting time ranges and returns a list of condensed ranges.

For example, given:

  [(0, 1), (3, 5), (4, 8), (10, 12), (9, 10)]

your function would return:

  [(0, 1), (3, 8), (9, 12)]

Do not assume the meetings are in order. The meeting times are coming from multiple teams.

Write a solution that's efficient even when we can't put a nice upper bound on the numbers representing our time ranges. Here we've simplified our times down to the number of 30-minute slots past 9:00 am. But we want the function to work even for very large numbers, like Unix timestamps. In any case, the spirit of the challenge is to merge meetings where start_time and end_time don't have an upper bound.

Gotchas

Look at this case:

  [(1, 2), (2, 3)]

These meetings should probably be merged, although they don't exactly "overlap"—they just "touch." Does your function do this?

Look at this case:

  [(1, 5), (2, 3)]

Notice that although the second meeting starts later, it ends before the first meeting ends. Does your function correctly handle the case where a later meeting is "subsumed by" an earlier meeting?

Look at this case:

  [(1, 10), (2, 6), (3, 5), (7, 9)]

Here all of our meetings should be merged together into just (1, 10). We need keep in mind that after we've merged the first two we're not done with the result—the result of that merge may itself need to be merged into other meetings as well.

Make sure that your function won't "leave out" the last meeting.

We can do this in $O(n\lg{n})$ time.

Breakdown

What if we only had two ranges? Let's take:

  [(1, 3), (2, 4)]

These meetings clearly overlap, so we should merge them to give:

  [(1, 4)]

But how did we know that these meetings overlap?

We could tell the meetings overlapped because the end time of the first one was after the start time of the second one! But our ideas of "first" and "second" are important here—this only works after we ensure that we treat the meeting that starts earlier as the "first" one.

How would we formalize this as an algorithm? Be sure to consider these edge cases:

The end time of the first meeting and the start time of the second meeting are equal. For example: [(1, 2), (2, 3)]
The second meeting ends before the first meeting ends. For example: [(1, 5), (2, 3)]

Here's a formal algorithm:

We treat the meeting with earlier start time as "first," and the other as "second."
If the end time of the first meeting is equal to or greater than the start time of the second meeting, we merge the two meetings into one time range. The resulting time range's start time is the first meeting's start, and its end time is the later of the two meetings' end times.
Else, we leave them separate.

So, we could compare every meeting to every other meeting in this way, merging them or leaving them separate.

Comparing all pairs of meetings would take $O(n^2)$ time. We can do better!

If we're going to beat $O(n^2)$ time, maybe we're going to get $O(n)$ time? Is there a way to do this in one pass?

It'd be great if, for each meeting, we could just try to merge it with the next meeting. But that's definitely not sufficient, because the ordering of our meetings is random. There might be a non-next meeting that the current meeting could be merged with.

What if we sorted our list of meetings by start time?

Then any meetings that could be merged would always be adjacent!

So we could sort our meetings, then walk through the sorted list and see if each meeting can be merged with the one after it.

Sorting takes $O(n\lg{n})$ time in the worst case. If we can then do the merging in one pass, that's another $O(n)$ time, for $O(n\lg{n})$ overall. That's not as good as $O(n)$ , but it's better than $O(n^2)$ .

Solution

First, we sort our input list of meetings by start time so any meetings that might need to be merged are now next to each other.

Then we walk through our sorted meetings from left to right. At each step, either:

We can merge the current meeting with the previous one, so we do.
We can't merge the current meeting with the previous one, so we know the previous meeting can't be merged with any future meetings and we throw the current meeting into merged_meetings.

  def merge_ranges(meetings):

    # Sort by start time
    sorted_meetings = sorted(meetings)

    # Initialize merged_meetings with the earliest meeting
    merged_meetings = [sorted_meetings[0]]

    for current_meeting_start, current_meeting_end in sorted_meetings[1:]:
        last_merged_meeting_start, last_merged_meeting_end = merged_meetings[-1]

        # If the current meeting overlaps with the last merged meeting, use the
        # later end time of the two
        if (current_meeting_start <= last_merged_meeting_end):
            merged_meetings[-1] = (last_merged_meeting_start,
                                   max(last_merged_meeting_end,
                                       current_meeting_end))
        else:
            # Add the current meeting since it doesn't overlap
            merged_meetings.append((current_meeting_start, current_meeting_end))

    return merged_meetings

Complexity

$O(n\lg{n})$ time and $O(n)$ space.

Even though we only walk through our list of meetings once to merge them, we sort all the meetings first, giving us a runtime of $O(n\lg{n})$ . It's worth noting that if our input were sorted, we could skip the sort and do this in $O(n)$ time!

We create a new list of merged meeting times. In the worst case, none of the meetings overlap, giving us a list identical to the input list. Thus we have a worst-case space cost of $O(n)$ .

Bonus

What if we did have an upper bound on the input values? Could we improve our runtime? Would it cost us memory?
Could we do this "in place" on the input list and save some space? What are the pros and cons of doing this in place?

What We Learned

This one arguably uses a greedy ↴

A greedy algorithm builds up a solution by choosing the option that looks the best at every step.

Say you're a cashier and need to give someone 67 cents (US) using as few coins as possible. How would you do it?

Whenever picking which coin to use, you'd take the highest-value coin you could. A quarter, another quarter, then a dime, a nickel, and finally two pennies. That's a greedy algorithm, because you're always greedily choosing the coin that covers the biggest portion of the remaining amount.

Some other places where a greedy algorithm gets you the best solution:

Trying to fit as many overlapping meetings as possible in a conference room? At each step, schedule the meeting that ends earliest.
Looking for a minimum spanning tree in a graph? At each step, greedily pick the cheapest edge that reaches a new vertex.

Careful: sometimes a greedy algorithm doesn't give you an optimal solution:

When filling a duffel bag with cakes of different weights and values, choosing the cake with the highest value per pound doesn't always produce the best haul.
To find the cheapest route visiting a set of cities, choosing to visit the cheapest city you haven't been to yet doesn't produce the cheapest overall itinerary.

Validating that a greedy strategy always gets the best answer is tricky. Either prove that the answer produced by the greedy algorithm is as good as an optimal answer, or run through a rigorous set of test cases to convince your interviewer (and yourself) that its correct.

approach as well, except this time we had to sort the list first.

How did we figure that out?

We started off trying to solve the problem in one pass, and we noticed that it wouldn't work. We then noticed the reason it wouldn't work: to see if a given meeting can be merged, we have to look at all the other meetings! That's because the order of the meetings is random.

That's what got us thinking: what if the list were sorted? We saw that then a greedy approach would work. We had to spend $O(n\lg{n})$ time on sorting the list, but it was better than our initial brute force approach, which cost us $O(n^2)$ time!

Do you have an answer?

Show answer

I have an answer Tell me more

Wanna review this one again later? Or do you feel like you got it all?

Mark as done Pin for review later

You're out of free questions.

Gotchas

Breakdown

Solution

Complexity

Bonus

What We Learned

Programming interview questions by company:

Programming interview questions by topic: